Abstract

The study of social networks, and in particular the spread of disease on networks, has attracted consider- 
able recent attention in the physics community. In this paper, we show that a large class of standard epidemi- 
ological models, the so-called susceptible/infective/removed models, and many of their generalizations, can 
be solved exactly on a wide variety of networks. Solutions are possible for cases with heterogeneous or 
correlated probabilities of transmission, cases incorporating vaccination, and cases in which the network 
has complex structure of various kinds. We confirm the correctness of our solutions by comparison with 
computer simulations of epidemics propagating on the corresponding networks. 
Networks of various kinds have been the subject of much recent research within the physics
community. Social networks, technological networks, and biological networks
have all been examined and modeled in some detail. Most work has focussed on
structural properties of the networks in question — patterns of connection between people, comput- 
ers, species, and so forth. Structure, however, while important, is in most cases only a prerequisite 
to answering the question of real interest: what is the behavior of networked systems? One area 
in which some progress has been made towards answering this question is the study of the spread 
of disease. Recent simulation studies and approximate analytical treatments suggest that network 
structure can play a crucial role in defining the nature of a disease epidemic [10, 11, 12].

In this paper, we show that the most fundamental standard model of disease propagation, the 
SIR model, and a large set of its generalized forms, are exactly solvable on a broad class of 
networks, including networks with social or community structure of various kinds (e.g., networks 
in which people are distinguished by different roles that they play). Our solutions provide exact 
criteria for deciding when an epidemic will occur, how many people will be affected, and how 
the network structure or the transmission properties of the disease could be modified in order to 
prevent the epidemic. 

The SIR model [13] is a model of disease propagation in which a population is divided into
three classes: susceptible (S), meaning they are free of the disease but can catch it, infective (I), 
meaning they have the disease and can pass it on to others, and removed (R), meaning they have 
recovered from the disease or died, and can no longer pass the disease on. There is a fixed
probability per unit time that an infective individual will pass the disease to a susceptible individual 
with whom they have contact, rendering that individual infective. Individuals who contract the 
disease remain infective for a certain time period before recovering (or dying) and thereby losing 
their infectivity. 

To turn this process into a complete model of disease spread we also need to know the pattern 
of contacts between individuals. In the standard treatments, and indeed in most of mathematical 
epidemiology, researchers use the so-called "fully mixed" approximation, in which it is assumed 
that every individual has an equal chance of contact with every other. This is an unrealistic assumption,
but it has proven popular because it allows one to write differential equations for the time
evolution of the disease that can be solved or numerically integrated to determine the course of 
an epidemic. More realistic versions of the model have also been studied in which populations 
are divided into groups according to age or other characteristics. The models are still fully mixed 
within each group however. In the real world, the pattern of contacts between individuals is far
from fully mixed, forming a social network with well-defined structure. Here, therefore, we aban- 
don the fully mixed approximation and turn our attention instead to models in which the social 
network is explicitly represented. 

In this paper we also abandon two other unrealistic assumptions of the usual SIR model, the 
assumptions that all contacts between individuals represent equal probability of disease transmis- 
sion, and that all individuals who catch the disease remain infective for the same amount of time. 
We will allow the probability per unit time $r_{ij}$ of transmission from an infective individual $i$ to a
susceptible individual $j$ to be drawn from any arbitrary distribution $P(r)$. We will also allow the
time $\tau$ for which individuals remain infective to be drawn from any arbitrary distribution $P(\tau)$.
These generalizations increase the range (and realism) of models to which our solutions are appli- 
cable. 

Consider then a network of initially susceptible individuals represented by the vertices of a 
graph. The edges of the graph represent connections between individuals by which disease can be 
transmitted. These connections might represent, for example, periodic physical proximity — two 
people working in the same building perhaps, or living in the same house. 

The crucial observation that makes our solutions possible is that SIR epidemic processes are 
equivalent to (generalized) bond percolation processes on the corresponding network of individuals
and contacts. This correspondence appears first to have been pointed out by Grassberger for the
case of the simple SIR model with fixed probabilities of infection and times of infectiveness [14].
More recently, it has been observed numerically that the correspondence extends also to the case of
variable probabilities and times [15]. In fact, it is straightforward to show that the above generalized
SIR process on a network corresponds to bond percolation on the same network with uniform
bond occupation probability

$$T = 1 - \int_0^\infty dr\, d\tau\, P(r) P(\tau)\, e^{-r\tau}. \tag{1}$$

The quantity $T$, which we call the transmissibility of the disease, lies in the range $0 \le T \le 1$ and
represents the average total probability that a susceptible individual will catch the disease from an
infective contact. Our solutions for SIR models are derived by combining this mapping to percolation
with a generating function technique similar to that introduced by Moore and Newman.
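As an illustration of how the transmissibility can be evaluated, the sketch below computes $T = 1 - \langle e^{-r\tau} \rangle$ for the distributions quoted in the caption of Fig. 1: $r$ uniform on $[0, r_{\max}]$ and $\tau$ uniform on the integers $1, \ldots, \tau_{\max}$. Applying the continuous-time factor $e^{-r\tau}$ to integer $\tau$ is an assumption made here for simplicity, and the function names are hypothetical.

```python
import math


def transmissibility_uniform(r_max, tau_max):
    """T = 1 - <exp(-r*tau)> for r ~ Uniform[0, r_max], tau ~ Uniform{1..tau_max}.

    Assumption: the continuous-time factor exp(-r*tau) is used even though
    tau takes integer values here.
    """
    total = 0.0
    for tau in range(1, tau_max + 1):
        # Average of exp(-r*tau) over r in [0, r_max], evaluated in closed form.
        total += (1.0 - math.exp(-r_max * tau)) / (r_max * tau)
    return 1.0 - total / tau_max


def transmissibility_quadrature(r_max, tau_max, n=20000):
    """Same quantity by midpoint-rule integration over r, as a cross-check."""
    dr = r_max / n
    acc = 0.0
    for tau in range(1, tau_max + 1):
        acc += sum(math.exp(-(i + 0.5) * dr * tau) for i in range(n)) * dr / r_max
    return 1.0 - acc / tau_max


if __name__ == "__main__":
    for r_max in (0.1, 0.5, 1.0):
        print(r_max, transmissibility_uniform(r_max, 10))
```

Because only the single number $T$ enters the percolation mapping, any pair of distributions $P(r)$, $P(\tau)$ that gives the same $T$ leads to the same predictions.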

One of the most important results to come out of recent work on networks is the finding that 
the degree distributions of many networks are highly right-skewed. (Recall that the "degree" of 
a vertex is the number of other vertices to which it is connected.) In other words, most vertices
have only a low degree, but there are a small number whose degree is very high. It
is known that the presence of these highly connected vertices can have a disproportionate effect
on certain properties of the network. Recent work suggests that the same may be true for disease
propagation on networks [17], and so it will be important that we incorporate non-trivial degree
distributions in our models. As a first illustration of our method therefore, we look at a simple class
of unipartite graphs studied previously by a number of authors [18, 19, 20], in which the degree
distribution is specified, but the graph is in other respects random.

Suppose that the probability of a randomly chosen vertex in our graph having degree $k$ is $p_k$.
We define two generating functions

$$G_0(x) = \sum_{k=0}^{\infty} p_k x^k, \qquad G_1(x) = \frac{1}{z} \sum_{k=0}^{\infty} k p_k x^{k-1}, \tag{2}$$

where $z = G_0'(1)$ is the mean vertex degree in the network. These two functions generate respectively
the probability distributions of the degrees of randomly chosen vertices, and vertices at the
ends of randomly chosen edges. Not all edges leading from a vertex will be occupied however 
(i.e., result in transmission of the disease). The distribution of the number m of occupied edges 
around a randomly chosen vertex is generated by 
$$\begin{aligned}
G_0(x;T) &= \sum_{m=0}^{\infty} \sum_{k=m}^{\infty} p_k \binom{k}{m} T^m (1-T)^{k-m} x^m \\
&= \sum_{k=0}^{\infty} p_k \sum_{m=0}^{k} \binom{k}{m} (xT)^m (1-T)^{k-m} = \sum_{k=0}^{\infty} p_k (1 - T + xT)^k \\
&= G_0(1 + (x-1)T).
\end{aligned} \tag{3}$$
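The identity in Eq. (3) can be checked numerically. The sketch below builds the distribution of occupied edges by explicit binomial thinning and compares its generating function with $G_0(1 + (x-1)T)$. The degree distribution `p` and the value of `T` are hypothetical values chosen only for illustration.

```python
from math import comb


def G0(x, p):
    """Generating function sum_k p[k] * x**k of a distribution p."""
    return sum(pk * x**k for k, pk in enumerate(p))


def occupied_edge_distribution(p, T):
    """Probability of m occupied edges: sum_{k>=m} p_k C(k,m) T^m (1-T)^(k-m)."""
    kmax = len(p) - 1
    return [
        sum(p[k] * comb(k, m) * T**m * (1 - T) ** (k - m) for k in range(m, kmax + 1))
        for m in range(kmax + 1)
    ]


if __name__ == "__main__":
    p = [0.1, 0.4, 0.3, 0.15, 0.05]  # hypothetical degree distribution
    T = 0.35                         # hypothetical transmissibility
    q = occupied_edge_distribution(p, T)
    for x in (0.0, 0.3, 0.7, 1.0):
        # The two columns should agree to rounding error.
        print(x, G0(x, q), G0(1 + (x - 1) * T, p))
```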

And similarly the number around the vertex at the end of a randomly chosen edge is generated
by $G_1(x;T) = G_1(1 + (x-1)T)$. Now the generating function $H_1(x;T)$ for the total number of
people infected as a result of a single transmission along an edge in the network must satisfy a
self-consistency condition of the form [16]

$$H_1(x;T) = x G_1(H_1(x;T); T). \tag{4}$$

And the distribution of the number of people affected by an outbreak starting with a single disease
carrier is generated by

$$H_0(x;T) = x G_0(H_1(x;T); T). \tag{5}$$
The average size $\langle s \rangle$ of a disease outbreak is then given by the derivative of $H_0$ with respect
to $x$:

$$\langle s \rangle = H_0'(1;T) = 1 + G_0'(1;T) H_1'(1;T) = 1 + \frac{G_0'(1;T)}{1 - G_1'(1;T)} = 1 + \frac{T G_0'(1)}{1 - T G_1'(1)}, \tag{6}$$

where we have made use of Eq. (4) and the fact that all generating functions are 1 at $x = 1$ if
the distributions that they generate are properly normalized. Eq. (6) diverges when $T$ is equal
to the critical value $T_c = 1/G_1'(1)$, and this point marks the onset of epidemic behavior. For
transmissibilities below this epidemic threshold, T < Tc, all outbreaks are finite in size, no matter 
how large the network, and the probability of any given individual being affected by an outbreak 
is zero in the limit of large graph size. For T > Tc there is always a finite chance of infection. The 
fraction of the population that is infected in an epidemic outbreak can be derived by observing that 
above $T_c$, Eq. (5) generates the size distribution of outbreaks excluding epidemics, and hence
the size S of the epidemic is given by the solution of 

$$S = 1 - G_0(u;T), \qquad u = G_1(u;T). \tag{7}$$

Unfortunately, it is not usually possible to find a closed form solution to this last equation, but it 
can be solved numerically by iteration from a suitable starting value of u. 
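The following is a minimal sketch of how Eqs. (6) and (7) might be evaluated for an arbitrary degree distribution supplied as a list `p`, with `p[k]` the probability of degree $k$. The starting value $u = 0$, the tolerance, and the truncation `kmax` of the example distribution are choices made here, not prescriptions from the text, and all function names are hypothetical.

```python
import math


def degree_moments(p):
    """Return z = G0'(1) and G1'(1) for a degree distribution p[k]."""
    z = sum(k * pk for k, pk in enumerate(p))
    g1_prime = sum(k * (k - 1) * pk for k, pk in enumerate(p)) / z
    return z, g1_prime


def critical_transmissibility(p):
    """Epidemic threshold T_c = 1 / G1'(1)."""
    return 1.0 / degree_moments(p)[1]


def mean_outbreak_size(p, T):
    """Eq. (6); only meaningful below the threshold, T < T_c."""
    z, g1_prime = degree_moments(p)
    return 1.0 + T * z / (1.0 - T * g1_prime)


def epidemic_size(p, T, tol=1e-13, max_iter=200000):
    """Eq. (7): iterate u = G1(1 + (u - 1) T), then S = 1 - G0(1 + (u - 1) T)."""
    z, _ = degree_moments(p)
    u = 0.0  # starting value (an assumption; any u in [0, 1) should serve)
    for _ in range(max_iter):
        y = 1.0 + (u - 1.0) * T
        u_new = sum(k * pk * y ** (k - 1) for k, pk in enumerate(p) if k > 0) / z
        converged = abs(u_new - u) < tol
        u = u_new
        if converged:
            break
    y = 1.0 + (u - 1.0) * T
    return 1.0 - sum(pk * y**k for k, pk in enumerate(p))


def truncated_power_law(alpha, kappa, kmax=2000):
    """p_k proportional to k^-alpha exp(-k/kappa) for 1 <= k <= kmax, p_0 = 0."""
    w = [0.0] + [k**-alpha * math.exp(-k / kappa) for k in range(1, kmax + 1)]
    c = sum(w)
    return [x / c for x in w]


if __name__ == "__main__":
    p = truncated_power_law(2.0, 10.0)
    print("T_c =", critical_transmissibility(p))
    for T in (0.2, 0.4, 0.6, 0.8):
        print(T, epidemic_size(p, T))
```

Convergence of the iteration slows as $T$ approaches $T_c$, so `max_iter` may need to be raised when working close to the transition.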

Note that it is not the case, even above Tc, that all outbreaks give rise to epidemics of the 
disease. There are still finite outbreaks even in the epidemic regime, and the probability of an 
outbreak becoming an epidemic at a given T is equal to S. While this appears very natural, it 
stands nonetheless in contrast to the standard fully mixed models, for which all outbreaks give rise 
to epidemics above the epidemic transition point. 

As an example of this first simple epidemic model, consider SIR disease outbreaks taking place 
on networks having a degree distribution with the truncated power-law form 

$$p_k = \begin{cases} 0 & \text{for } k = 0, \\ C k^{-\alpha} e^{-k/\kappa} & \text{for } k \ge 1, \end{cases} \tag{8}$$

where $\alpha$ and $\kappa$ are constants, and $C$ is set by the requirement that the distribution be normalized.
This distribution is seen in a number of networks in the real world and includes both pure
power-law and pure exponential distributions as special cases.

Substituting Eq. (8) into Eq. (6), we then find that our disease has an epidemic transition at

$$T_c = \frac{\mathrm{Li}_{\alpha-1}(e^{-1/\kappa})}{\mathrm{Li}_{\alpha-2}(e^{-1/\kappa}) - \mathrm{Li}_{\alpha-1}(e^{-1/\kappa})}, \tag{9}$$

where $\mathrm{Li}_n(x)$ is the $n$th polylogarithm of $x$. Below this transition no epidemics are possible, only
small outbreaks having average size

$$\langle s \rangle = 1 + \frac{T \left[ \mathrm{Li}_{\alpha-1}(e^{-1/\kappa}) \right]^2}{\mathrm{Li}_{\alpha}(e^{-1/\kappa}) \left[ (T+1)\, \mathrm{Li}_{\alpha-1}(e^{-1/\kappa}) - T\, \mathrm{Li}_{\alpha-2}(e^{-1/\kappa}) \right]}, \tag{10}$$

while above it, epidemics occur with size and probability $S$, whose value we can extract by numerical
iteration of Eq. (7).

FIG. 1: Epidemic size (top) and average outbreak size (bottom) for the SIR model on networks with degree
distributions of the form Eq. (8) as a function of transmissibility. Solid lines are the exact solutions, Eqs. (7)
and (10), for $\alpha = 2$ and (left to right in each panel) $\kappa = 20$, 10, and 5. Each of the points is an average result
for 10000 simulations on graphs of 100000 vertices each. The distributions $P(r)$ and $P(\tau)$ are uniform
over the intervals $0 \le r \le r_{\max}$ and $1 \le \tau \le \tau_{\max}$ respectively ($r$ real, $\tau$ integer), with $r_{\max}$ as indicated and
$\tau_{\max} = 1 \ldots 10$.

In Fig. 1 we compare the predictions of this solution against explicit simulations of epidemics
spreading on networks with heterogeneous transmission rates $r$ and infectiveness times $\tau$. As the
figure shows, agreement between analytic and numerical results is good.
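A comparison of this kind can be sketched in a few lines by exploiting the percolation mapping directly. The example below is deliberately simpler than the simulations reported in the text: it assumes a sparse random graph with a Poisson degree distribution of mean `z` rather than the truncated power law, and it occupies each edge independently with a single probability `T` instead of drawing $r$ and $\tau$ explicitly. For a Poisson distribution $G_0(x) = G_1(x) = e^{z(x-1)}$, so Eq. (7) reduces to $S = 1 - e^{-zTS}$. All names and parameter values are hypothetical.

```python
import math
import random


def largest_cluster_fraction(n, z, T, seed=0):
    """Bond-percolate a sparse random graph; return the largest cluster's share of vertices."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for _ in range(int(n * z / 2)):          # n*z/2 random edges give mean degree about z
        i, j = rng.randrange(n), rng.randrange(n)
        if i == j or rng.random() >= T:
            continue                         # skip self-loops and unoccupied edges
        ri, rj = find(i), find(j)
        if ri != rj:
            if size[ri] < size[rj]:
                ri, rj = rj, ri
            parent[rj] = ri
            size[ri] += size[rj]
    return max(size[find(v)] for v in range(n)) / n


def poisson_epidemic_size(z, T, iters=10000):
    """Solve S = 1 - exp(-z*T*S), i.e. Eq. (7) for a Poisson degree distribution."""
    S = 1.0
    for _ in range(iters):
        S = 1.0 - math.exp(-z * T * S)
    return S


if __name__ == "__main__":
    z = 3.0
    for T in (0.2, 0.4, 0.6, 0.8):
        print(T, largest_cluster_fraction(20000, z, T), poisson_epidemic_size(z, T))
```

Above the threshold $T_c = 1/z$ the largest cluster plays the role of the epidemic, and its fraction should approach $S$ as the graph grows.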

To emphasize the difference between our results and those for the equivalent fully mixed model, 
we compare the position of the epidemic threshold in the two cases. In the case $\alpha = 2$, $\kappa = 10$ (the
middle curve in each frame of Fig. 1), our analytic solution predicts that the epidemic threshold
occurs at $T_c = 0.329$. The simulations agree well with this prediction, giving $T_c = 0.32(2)$. By
contrast, a fully mixed SIR model in which each infective individual transmits the disease to the
same average number of others as in our network, gives a very different prediction of $T_c = 0.558$.
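The network value of the threshold can be reproduced from the polylogarithm expressions, Eqs. (9) and (10). The sketch below evaluates $\mathrm{Li}_n(x)$ by direct summation of its defining series, which is adequate for $0 \le x < 1$; the function names and the stopping tolerance are choices made here.

```python
import math


def polylog(n, x, tol=1e-16, max_terms=10**7):
    """Li_n(x) = sum_{k>=1} x^k / k^n by direct summation, for 0 <= x < 1."""
    total = 0.0
    xk = x
    for k in range(1, max_terms + 1):
        term = xk / k**n
        total += term
        if term <= tol * total:
            break
        xk *= x
    return total


def critical_transmissibility(alpha, kappa):
    """Eq. (9) for the truncated power-law degree distribution."""
    x = math.exp(-1.0 / kappa)
    return polylog(alpha - 1, x) / (polylog(alpha - 2, x) - polylog(alpha - 1, x))


def mean_outbreak_size(T, alpha, kappa):
    """Eq. (10); only meaningful below the threshold."""
    x = math.exp(-1.0 / kappa)
    l0, l1, l2 = polylog(alpha, x), polylog(alpha - 1, x), polylog(alpha - 2, x)
    return 1.0 + T * l1**2 / (l0 * ((T + 1.0) * l1 - T * l2))


if __name__ == "__main__":
    for kappa in (20, 10, 5):
        print(kappa, critical_transmissibility(2.0, kappa))
    # For kappa = 10 the printed value is about 0.329, as quoted in the text.
```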

Although the model above is already more realistic than the standard epidemic models in 
a number of ways (network structure, heterogeneous transmission, heterogeneous infectiveness 
times), there are many ways it can be further improved. For instance, with real diseases the transmission
rates $r$ or the infectiveness times $\tau$ may not be iid random variables as we have assumed;
they may be correlated. As an example of how this can be incorporated into the model, consider the
case where the distribution of transmission rates $r$ depends on the degree $k$ of the vertex representing
the infective individual. (One could imagine for example that people who have many contacts
tend also to have more fleeting contacts, so that $r$ would go down on average with increasing $k$.)
Then the transmissibility also becomes a function of $k$ according to $T_k = 1 - \int dr\, d\tau\, P_k(r) P(\tau)\, e^{-r\tau}$
and the generating functions become a function of the complete set $\{T_k\}$. Alternatively, the distribution
of $r$ might depend on the degree of the individual being infected, which gives us a similar
set $\{U_k\}$ of transmissibilities. Or $r$ might depend on both degrees. The correct generalization of
the generating functions is:

$$G_0(x; \{T_k\}, \{U_k\}) = \sum_k p_k \left[ 1 + (x-1) T_k \right]^k, \tag{11}$$

$$G_1(x; \{T_k\}, \{U_k\}) = \frac{1}{z} \sum_k k p_k \left[ 1 + \left( \left[ 1 + (x-1) T_k \right]^{k-1} - 1 \right) U_k \right]. \tag{12}$$

The cases in which transmission depends only on one degree or the other can be derived from these
expressions by setting either $T_k = 1$ or $U_k = 1$ for all $k$. Once we have the generating functions,
then the calculation proceeds as before, with mean outbreak size below the epidemic transition
being given by Eq. (6) and epidemic size above it by Eq. (7). The epidemic transition occurs as
before at $G_1'(1; \{T_k\}, \{U_k\}) = 1$.
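Differentiating Eq. (12) at $x = 1$ gives $G_1'(1; \{T_k\}, \{U_k\}) = z^{-1} \sum_k k(k-1) p_k T_k U_k$, so the threshold condition is easy to test for any choice of degree-dependent transmissibilities. The sketch below does this and cross-checks the derivative numerically. The degree distribution and the form $T_k = \min(1, 1.5/k)$, meant to mimic more fleeting contacts at high degree, are hypothetical illustrations.

```python
def mean_degree(p):
    return sum(k * pk for k, pk in enumerate(p))


def G1_general(x, p, T, U):
    """Eq. (12): (1/z) sum_k k p_k [1 + ((1 + (x-1) T_k)^(k-1) - 1) U_k]."""
    z = mean_degree(p)
    return sum(
        k * pk * (1.0 + ((1.0 + (x - 1.0) * T[k]) ** (k - 1) - 1.0) * U[k])
        for k, pk in enumerate(p)
        if k > 0
    ) / z


def G1_prime_at_one(p, T, U):
    """Derivative of Eq. (12) at x = 1: (1/z) sum_k k (k-1) p_k T_k U_k."""
    z = mean_degree(p)
    return sum(k * (k - 1) * pk * T[k] * U[k] for k, pk in enumerate(p)) / z


if __name__ == "__main__":
    p = [0.0, 0.5, 0.25, 0.12, 0.08, 0.05]                    # hypothetical degree distribution
    T = [0.0] + [min(1.0, 1.5 / k) for k in range(1, len(p))]  # hypothetical T_k
    U = [1.0] * len(p)                                        # no dependence on the infected vertex
    slope = G1_prime_at_one(p, T, U)
    print("G1'(1) =", slope, "epidemic possible" if slope > 1.0 else "no epidemic")
```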

Another area of current interest is models incorporating vaccination of individuals. We
show elsewhere that models with vaccination can also be solved exactly, both in the case
of uniform independent vaccination probability (i.e., random vaccination of a population) and in 
the case of vaccination which is correlated with properties of individuals such as their degree (so 
that vaccination can be directed at the so-called core group of the disease-carrying network — those 
with the highest degrees). 

The other main way in which we can make our models more realistic, while still retaining exact 
solvability, is to incorporate more realistic social structure into our networks. As an example, 
consider the network by which a sexually transmitted disease is communicated, which is also
the network of sexual partnerships between individuals. In a recent study of 2810 respondents
Liljeros et al. [5] recorded the numbers of sexual partners of men and women over the course of
a year. From their data it appears that the distributions of these numbers are power-law in form
$p_k \sim k^{-\alpha}$ for both men and women with exponents $\alpha$ that fall in the range 3.1 to 3.3. If we assume
that the disease of interest is transmitted primarily by contacts between men and women (true only
for some diseases), then to a good approximation the network of contacts is bipartite [20]. We
define two pairs of generating functions for males and females: 

$$F_0(x) = \sum_j p_j x^j, \qquad F_1(x) = \frac{1}{\mu} \sum_j j p_j x^{j-1}, \tag{13}$$

$$G_0(x) = \sum_k q_k x^k, \qquad G_1(x) = \frac{1}{\nu} \sum_k k q_k x^{k-1}, \tag{14}$$

where $p_j$ and $q_k$ are the two degree distributions and $\mu$ and $\nu$ are their means. We can then develop
expressions similar to Eqs. (6) and (7) for an epidemic on this new network. For instance, the
epidemic transition takes place at the point where $T_{mf} T_{fm} = 1/[F_1'(1) G_1'(1)]$ where $T_{mf}$ and $T_{fm}$
are the transmissibilities for male-to-female and female-to-male infection respectively.

One important result that follows immediately is that if the degree distributions are truly power-law
in form, then there exists an epidemic transition only for a small range of values of the exponent
$\alpha$ of the power law. Let us assume, as appears to be the case, that the exponents are roughly
equal for men and women: $\alpha_m = \alpha_f = \alpha$. Then if $\alpha < 3$, we find that $T_{mf} T_{fm} = 0$, which is only
possible if at least one of the transmissibilities $T_{mf}$ and $T_{fm}$ is zero. As long as both are positive,
we will always be in the epidemic regime, and this would clearly be bad news. No amount of
precautionary measures to reduce the probability of transmission would ever eradicate the disease.
(Similar results have been seen in other types of models also [17].) Conversely, if $\alpha > \alpha_c$,
where $\alpha_c = 3.4788\ldots$ is the solution of $\zeta(\alpha - 2) = 2\zeta(\alpha - 1)$, we find that $T_{mf} T_{fm} = 1$, which
is only possible if both $T_{mf}$ and $T_{fm}$ are 1. When either is less than 1 no epidemic will ever occur,
which would be good news. Only in the small intermediate region $3 < \alpha < 3.4788\ldots$ does
the model possess an epidemic transition. Interestingly, the real-world network measured by Liljeros
et al. [5] appears to fall precisely in this region, with $\alpha \simeq 3.2$. If true, this would be both
good and bad news. On the bad side, it means that epidemics can occur. But on the good side, it
means that it is in theory possible to prevent an epidemic by reducing the probability of transmission,
which is precisely what most health education campaigns attempt to do. The predicted
critical value of the transmissibility is $\zeta(\alpha - 1)/[\zeta(\alpha - 2) - \zeta(\alpha - 1)]$, which gives $T_c = 0.363\ldots$
for $\alpha = 3.2$. Epidemic behavior would cease were it possible to arrange that $T_{mf} T_{fm} < T_c^2$.
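The two numbers quoted above, $\alpha_c = 3.4788\ldots$ and $T_c = 0.363\ldots$ at $\alpha = 3.2$, can be checked with a short calculation. The sketch below evaluates the Riemann zeta function for real argument greater than 1 by direct summation with an Euler-Maclaurin tail correction (an implementation choice made here to avoid external libraries) and finds $\alpha_c$ by bisection. The function names and the bracketing interval are assumptions.

```python
def zeta(s, n=1000):
    """Riemann zeta for real s > 1: direct sum plus Euler-Maclaurin tail correction."""
    head = sum(k**-s for k in range(1, n))
    tail = n ** (1.0 - s) / (s - 1.0) + 0.5 * n**-s + s * n ** (-s - 1.0) / 12.0
    return head + tail


def critical_transmissibility(alpha):
    """T_c = zeta(alpha-1) / [zeta(alpha-2) - zeta(alpha-1)], for alpha > 3."""
    return zeta(alpha - 1.0) / (zeta(alpha - 2.0) - zeta(alpha - 1.0))


def critical_exponent(lo=3.05, hi=4.0, iters=60):
    """Solve zeta(alpha-2) = 2 zeta(alpha-1) by bisection on [lo, hi]."""
    def f(a):
        return zeta(a - 2.0) - 2.0 * zeta(a - 1.0)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:   # f is positive below the root and negative above it
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("alpha_c =", critical_exponent())
    print("T_c(3.2) =", critical_transmissibility(3.2))
```

At $\alpha = \alpha_c$ the same expression gives $T_c = 1$, which is why no transition exists for larger exponents.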

Some caveats are in order here. The error bars on the values of the exponent a are quite large 
(about ±0.3 [5]). Thus, assuming that the conclusion of a power-law degree distribution is correct
in the first place, it is still possible that a < 3, putting us in the regime where there is always 
epidemic behavior regardless of the value of the transmissibility. On the other hand, it may also 
be that the distribution is not a perfect power law. Although the measured distributions do appear 
to have power-law tails, it seems likely that these tails are cut off at some point. If this is the 
case, then there will always be an epidemic transition at finite T, regardless of the value of a. 
Furthermore, if it were possible to reduce the number of partners that the most active members 
of the network have, so that the cutoff moves lower, then the epidemic threshold rises, making 
it easier to eradicate the disease. Interestingly, the fraction of individuals in the network whose
degree need change in order to make a significant difference is quite small. At $\alpha = 3$, for instance,
a change of cutoff from $\kappa = \infty$ to $\kappa = 100$ affects only 1.3% of the population, but increases the
epidemic threshold from $T_c = 0$ to $T_c = 0.52$. In other words, targeting preventative efforts at
changing the behavior of the most active members of the network may be a much more promising 
way of preventing the spread of disease than targeting everyone. (This suggestion is certainly not 
new, but our models provide a quantitative basis for assessing its efficacy.) 

Another application of the techniques presented here is described in a separate paper. In that paper we
model in detail the spread of walking pneumonia (Mycoplasma pneumoniae) in a closed setting 
(a hospital) for which network data are available from observation of an actual outbreak. In this 
example, our exact solutions agree well both with simulations and with data from the outbreak 
studied. Furthermore, examination of the analytic solution allows us to make specific suggestions 
about possible new control strategies for M. pneumoniae infections in settings of this type. 

Applications of the techniques described here are also possible for networks specific to many 
other settings, and hold promise for the better understanding of the role that the structure of contact 
networks plays in the spread of disease. 

The author thanks Lauren Ancel, Laszlo Barabasi, Duncan Callaway, Michelle Girvan, and Catherine
Macken for useful comments. This work was supported in part by the National Science Foundation under 
grant number DMS-0109086. 